Tuning an Existing Nomenclature for Specific Domain Corpora: A Syntax-Based Similarity Method
Abstract
There is a constant need to extend and tune medical vocabularies to account for new words and new word usages. Robust natural language processing (NLP) tools can be applied to medical text corpora, such as patient narratives, to help collect and analyze unknown words [1,2]. The aim of the present work is to assess the potential for classifying unknown words based on the semantic categories of their “neighbors”, identified through syntactic distributional properties [3].
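The idea of classifying an unknown word from the semantic categories of its distributional neighbors can be illustrated with a minimal sketch. The abstract does not say which similarity measure or decision rule is used, so cosine similarity and a similarity-weighted vote over the k nearest known words are assumptions made here for illustration only. The function names (`cosine`, `classify_unknown`), the toy words, the syntactic contexts and the category labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_unknown(word, contexts, categories, k=3):
    """Guess a category for `word` from its k most similar known words.

    contexts:   word -> Counter of (syntactic relation, head word) counts
    categories: known word -> semantic category (hypothetical labels)
    """
    target = contexts[word]
    scored = sorted(
        (
            (cosine(target, contexts[known]), known)
            for known in categories
            if known in contexts and known != word
        ),
        reverse=True,
    )
    votes = Counter()
    for score, neighbor in scored[:k]:
        if score > 0:  # ignore words sharing no syntactic context
            votes[categories[neighbor]] += score
    return votes.most_common(1)[0][0] if votes else None


# Toy, hand-made syntactic contexts; not data from the paper.
contexts = {
    "cephalalgia": Counter({("obj_of", "report"): 2, ("subj_of", "persist"): 1}),
    "headache": Counter({("obj_of", "report"): 3, ("subj_of", "persist"): 2}),
    "nausea": Counter({("obj_of", "report"): 1, ("subj_of", "persist"): 1}),
    "aspirin": Counter({("obj_of", "prescribe"): 4, ("obj_of", "take"): 2}),
}
categories = {"headache": "symptom", "nausea": "symptom", "aspirin": "drug"}

print(classify_unknown("cephalalgia", contexts, categories))
```

In this toy setting the unknown word shares its syntactic contexts only with the two words labeled as symptoms, so the vote goes to that category. A real system would take the contexts from a parser run over the corpus and the categories from the existing nomenclature.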
Similar resources
An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation
In this paper, we propose a novel domain adaptation method named “mixed fine tuning” for neural machine translation (NMT). We combine two existing approaches, namely fine tuning and multi-domain NMT. We first train an NMT model on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus which is a mix of the in-domain and out-of-domain corpora. All corpora are augmented with a...
Measures of semantic similarity and relatedness in the biomedical domain
Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concep...
Exploitation of semantic similarity for adaptation of existing terminologies within biomedical area
We present a novel method for adaptation of existing terminologies. Within the biomedical domain, and when no textual corpora for building terminologies are available, we exploit the UMLS Metathesaurus, which merges over a hundred existing biomedical terminologies and ontologies. We also exploit algorithms for measuring semantic similarity in order to limit, within UMLS, a semantically homogeneous sp...
NLPCC 2016 Shared Task Chinese Words Similarity Measure via Ensemble Learning Based on Multiple Resources
Many algorithms for measuring Chinese word similarity have been introduced, since it is a fundamental issue in various natural language processing tasks. Previous work focused mainly on using existing semantic knowledge bases or large-scale corpora. However, knowledge bases and corpora are limited in coverage and in how quickly their data is updated. Thus, ensemble learning is then used to improve performance by ...
Web-Based Semantic Similarity: An Evaluation in the Biomedical Domain
Computation of semantic similarity between concepts is a very common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to deal with this issue by exploiting the structured knowledge available in domain ontologies (such as SNOMED-CT or MeSH) and specific, closed and reliable corpora (such as clinical data). However, in r...